ABSTRACT

This paper describes how we have implemented a scan line
graphics generation algorithm on the Massively Parallel
Processor (MPP). Pixels are computed in parallel and their
results are applied to the Z buffer in large groups. Performing
the pixel value calculations, balancing the load across the
processors, and applying the results to the Z buffer efficiently
in parallel require special virtual routing (sort computation
[1,2]) techniques developed by the author especially for use on
single-instruction multiple-data (SIMD) architectures.
INTRODUCTION

A scan line graphics generation algorithm determines the
brightness of the pixels in a simulated 3-D scene one scan line
at a time. The brightness value of a pixel is based on the
surface brightness of the simulated polygon that would be seen
through the pixel. Triangles are the polygons used here. Only the
triangle nearest the pixel on the simulated viewing screen will
be seen through the pixel. Therefore a Z buffer is set up to
accumulate the values of the polygons closest to the viewing
screen. It is not actually necessary to process only one scan
line at a time. On the MPP, a subset of the triangles is
processed at a time for all scan lines that these triangles
cover. This is done by projecting each triangle onto the viewing
screen and determining which scan lines it covers. Then the
pixels of each scan line that the triangle covers are determined.
This results in pixels of different values and distances from the
viewing screen, which are loaded into the Z buffer. When all
triangles have been processed, the Z buffer can be displayed as
an image.

To compute pixel values efficiently in parallel, a load balancing
method was developed so that as many processors as possible could
be kept busy. This is important when greater parallelism can be
realized by duplicating data into more processors. It is
complicated by the fact that data in certain processors is found
to be of no further computational use, which leaves processors
scattered unpredictably through the array without work to do.
Therefore the data must be moved in such a way that, when it is
slid to new processors, it is known not to overwrite useful data.
This movement, or compression, is done by sorting. Although
efficiency of processor usage is of primary interest here,
efficiency of data movement is also important, so the
inefficiencies in the use of sorting are also considered. This
has prompted a modification of the sorts used: a preprocessing
(scout) step determines how much of the sort is necessary to
provide sufficient contiguous space to duplicate the data. Once
this has been determined, a sort is used to compress the data,
and it can be terminated early based on the information derived
by the scout step. This makes it possible to keep as many
processors as possible busy with reasonable efficiency.

PROJECTION CALCULATION 

The projection calculation converts the three coordinates of the
three corners of a triangle in a 3-dimensional viewing space into
two coordinates on the viewing screen and a range from the view
point. Given the coordinates of the triangle (X_1, Y_1, Z_1,
X_2, Y_2, Z_2, X_3, Y_3, Z_3), the coordinates of the view point
(X_v, Y_v, Z_v), and the projected coordinates (X'_1, Y'_1, R'_1,
X'_2, Y'_2, R'_2, X'_3, Y'_3, R'_3), the following equations do
the conversion from 3-D coordinates to 2-D projected coordinates.
The first set of equations rotates the triangles in space so that
the viewing axis lines up with the Z axis. Thus the view point
will lie along the Z axis.

    X''_i = X_i z_v - Z_i x_v,   Y''_i = Y_i,

    Z''_i = X_i x_v + Z_i z_v,

    X''_v = 0,   Y''_v = Y_v,   and   Z''_v = \sqrt{X_v^2 + Z_v^2},

where i = 1, 2, 3 indexes the corners and x_v and z_v are the
values of X_v and Z_v normalized by \sqrt{X_v^2 + Z_v^2}.

    X'''_i = X''_i,   Y'''_i = Y''_i z''_v - Z''_i y''_v,

    Z'''_i = Y''_i y''_v + Z''_i z''_v,

where y''_v and z''_v are normalized values of Y''_v and Z''_v.
Thus the rotated coordinates of a triangle are X'''_1, Y'''_1,
Z'''_1, X'''_2, Y'''_2, Z'''_2, X'''_3, Y'''_3, and Z'''_3. The
rotated triangles are projected onto the screen, which is the
distance R from the view point. The following equations give the
values for X', Y', and R' for each corner of a triangle.


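    R_v = \sqrt{X_v^2 + Y_v^2 + Z_v^2},

    X' = X''' [scale factor illegible in source],
    Y' = Y''' [scale factor illegible in source],

    and   R' = \sqrt{(X''' - X'''_v)^2 + (Y''' - Y'''_v)^2 + (Z''' - Z'''_v)^2}.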


A brightness value (B) is also calculated for each triangle. The 
actual means of calculating it is not important, only that it exists 
and must be included with the rest of the information for each 
triangle. 
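
The two rotations described in this section can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions, not the MPP implementation: the function name is
hypothetical, the normalizations are taken over the two
coordinates involved in each rotation, and the assignment of the
sum and difference terms is chosen so that the rotated view point
has X''_v = 0 and ends up on the positive Z axis.

```python
import math


def rotate_to_view_axis(point, view):
    """Rotate `point` so that the view point `view` lands on the +Z axis.

    Assumption: x_v, z_v are X_v, Z_v divided by sqrt(X_v^2 + Z_v^2), and
    y''_v, z''_v are Y''_v, Z''_v divided by sqrt(Y''_v^2 + Z''_v^2).
    """
    X, Y, Z = point
    Xv, Yv, Zv = view

    # First rotation (about the Y axis): view point moves into the Y-Z plane.
    d1 = math.hypot(Xv, Zv)
    xv, zv = (Xv / d1, Zv / d1) if d1 > 0 else (0.0, 1.0)
    X2 = X * zv - Z * xv
    Y2 = Y
    Z2 = X * xv + Z * zv
    Yv2, Zv2 = Yv, d1  # rotated view point is (0, Y_v, d1)

    # Second rotation (about the X axis): view point moves onto the Z axis.
    d2 = math.hypot(Yv2, Zv2)
    if d2 == 0:
        raise ValueError("view point must not be the origin")
    yv2, zv2 = Yv2 / d2, Zv2 / d2
    X3 = X2
    Y3 = Y2 * zv2 - Z2 * yv2
    Z3 = Y2 * yv2 + Z2 * zv2
    return (X3, Y3, Z3)
```

Applying the function to the view point itself returns
(0, 0, R_v) up to rounding, which is a convenient check that the
viewing axis has been lined up with the Z axis.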


SCAN LINE DETERMINATION 

Once the projection calculations have been performed, each
triangle will be described by an X and Y coordinate and a range
for each corner, and a brightness for the entire triangle. This
information will make up a triangle description record. These
records will be duplicated so that there exists one copy of a
triangle's description record for each scan line that intersects
the triangle's projection onto the screen.


Assume that scan lines are parallel to the X axis. Then the
corners of a triangle with the largest and smallest Y values
define the range of scan lines that the triangle intersects. By
recursively dividing this range in half and making records
corresponding to the two halves, we will eventually have a record
for each scan line in the range. The difficulty arises when this
has to be done in parallel, especially when it is done on a large
array of processors like the MPP. The number of scan lines that a
triangle overlaps is not the same for all triangles. This means
that the rate of creation of new records is uneven across the
processors, and some sort of load balancing must be performed if
one is to efficiently utilize large arrays of processors.
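
The recursive halving of scan line ranges can be illustrated with
a short sequential Python sketch. The record layout
(triangle id, first scan line, last scan line) and the function
names are assumptions made for this example; on the MPP every
record would be halved in the same step, which this code imitates
with one pass over the list.

```python
def halve_ranges(records):
    """One pass: split every record that still spans more than one scan line."""
    result = []
    for tri, lo, hi in records:
        if lo < hi:
            mid = (lo + hi) // 2
            result.append((tri, lo, mid))       # lower half of the range
            result.append((tri, mid + 1, hi))   # upper half of the range
        else:
            result.append((tri, lo, hi))        # already a single scan line
    return result


def expand_to_scan_lines(records):
    """Repeat the halving pass until each record covers exactly one scan line."""
    passes = 0
    while any(lo < hi for _, lo, hi in records):
        records = halve_ranges(records)
        passes += 1
    return records, passes
```

For the input [("A", 3, 7), ("B", 10, 10)], triangle A grows to
five records over three passes while triangle B never grows, which
is the uneven rate of record creation described above.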

LOAD BALANCING 


Load balancing consists of redistributing records across the
processors when some processors contain more than one record.
This is caused by creating more records in one area of the array
of processors than in others. One can do this by moving all the
records to one end of the array of processors, only one record
per processor. Any leftover records, if all processors have at
least one record, can be saved in a stack. There are several
means by which the records can be moved (compressed) to one end
of the array, but we have found that the parallel bitonic sort is
very efficient at doing this on the MPP. So the use of sort to
load balance is what will be discussed here.
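
As a rough illustration of compression by sorting, the following
Python sketch moves all records to one end of a list of
processors, one record per processor, and puts any leftover
records on a stack. It is a sequential stand-in only: Python's
stable built-in sort replaces the parallel bitonic sort, and the
function name and the list-of-lists data layout are assumptions
made for this example.

```python
def load_balance(processors):
    """processors: list of lists, each holding the records of one processor.

    Returns (slots, stack): `slots` has one entry per processor (a record or
    None), with all records at the low end; `stack` holds leftover records.
    """
    n = len(processors)
    # One cell per record; an empty processor contributes a single empty cell.
    cells = [rec for held in processors for rec in (held or [None])]
    # Sorting on the "empty" flag moves every record ahead of every empty cell.
    packed = sorted(cells, key=lambda cell: cell is None)
    slots = packed[:n]
    stack = [cell for cell in packed[n:] if cell is not None]
    return slots, stack
```

Because the sort is stable, records keep their relative order, so
neighboring records stay close together after compression.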


Actually the records are sorted to one end of the array so that
there are two records per processor. Therefore if only half of
the processors have any record in them, then half must have none.


The final step of the load balancing is to move one record from
each processor that has two to a processor that has none by
sliding them across half the array. This means a complete sort
has to be done and the data moved halfway across the array.
Though the sort is efficient, there is no sense in doing a
complete one if one doesn't have to.

Therefore a scouting step was developed to determine how much of
the sort needs to be performed so that records can be simply
moved to empty processors. "Simply" here means moving one record
from each processor that has two to a processor that has none by
moving them all the same number of processors away from their
original processor.

For an incomplete sort to be useful, at least the following
condition must be true: for every group of processors, at least
half of the processors must be empty. These groups must contain
the same number of processors, and all records within each group
must be compressed to the same side of the array of processors in
the group. The scout routine determines the shortest sort that
will meet these conditions by performing a sort on a set of flags
that represent where the records exist within the array. It
differs from the sort in that after every merge step it checks to
see if the required conditions have been met.
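
The condition checked by the scout step can be illustrated with a
simplified Python model. Several things here are assumptions made
for the example rather than details taken from the MPP code: each
processor holds at most one record that needs a copy, the array
length and group sizes are powers of two, and the occupancy of
each group is counted directly instead of running merge steps on
flags. A stable sort inside each group stands in for the
truncated sort.

```python
def scout(occupied):
    """Smallest power-of-two group size whose groups are all at least half empty.

    occupied: list of 0/1 flags, one per processor. Returns None if even the
    whole array is more than half full.
    """
    n = len(occupied)
    g = 2
    while g <= n:
        if all(sum(occupied[s:s + g]) <= g // 2 for s in range(0, n, g)):
            return g
        g *= 2
    return None


def compress_and_duplicate(cells, g):
    """Compress each group of g cells to its low side, then copy every record
    to the processor g/2 positions away (the same distance for all records)."""
    out = []
    half = g // 2
    for s in range(0, len(cells), g):
        group = sorted(cells[s:s + g], key=lambda c: c is None)
        for i in range(half):
            if group[i] is not None:
                group[i + half] = group[i]
        out.extend(group)
    return out
```

With flags [1, 1, 0, 0, 1, 0, 0, 0], groups of two fail because
the first group is full, but groups of four pass, so the sort can
stop once groups of four are compressed instead of sorting the
whole array.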

Scan line determination is merely duplication of records,
modification so that they represent different ranges of scan
lines, and redistribution of records (load balancing). This is
repeated until each record represents only one scan line.

PIXEL DETERMINATION 


At this point each record represents a triangle and one scan line
that it intersects. The range along the scan line that is covered
by the triangle is determined. Then, in the same way that scan
line ranges were reduced to individual scan lines, pixel ranges
are reduced to individual pixels. Analogous to scan line
determination, pixel determination involves duplication of
records, modification so that they represent different ranges of
pixels, and redistribution of records (load balancing). Thus,
each record will represent a triangle and a pixel that it covers.
From each of these records a pixel record is created that
contains the pixel's location on the screen, the brightness of
the triangle, and the distance to the triangle as seen through
the pixel.
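
One way to find the range of pixels that a triangle covers on a
given scan line is to intersect the scan line with the triangle's
three edges. The Python sketch below does this for projected
(x, y) corners. The function name and the convention that pixels
sit at integer x positions are assumptions for this example; the
text does not say how the MPP code computes the range.

```python
import math


def scanline_span(corners, y):
    """Return (first_x, last_x) of the pixels covered on scan line y, or None.

    corners: three projected (x, y) pairs.
    """
    xs = []
    for i in range(3):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 3]
        if y0 == y1:
            # Horizontal edge: it contributes both ends if it lies on the line.
            if y == y0:
                xs.extend([x0, x1])
        elif min(y0, y1) <= y <= max(y0, y1):
            # Linear interpolation along the edge to the scan line.
            t = (y - y0) / (y1 - y0)
            xs.append(x0 + t * (x1 - x0))
    if not xs:
        return None
    first, last = math.ceil(min(xs)), math.floor(max(xs))
    return (first, last) if first <= last else None
```

The resulting (first_x, last_x) range can then be halved
repeatedly, exactly as scan line ranges were, until each record
names a single pixel.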

Z BUFFERING 

Many of the pixel records will represent the same pixel, but with
different brightness and range values. The Z buffer is merely a
collection of the records from which duplicate pixel records have
been eliminated. They are eliminated based on their range value.
Only the pixel record with the smallest range is kept for each
pixel. This is done using a sort computation function, sort
minimum, which will flag the minimum range record for each pixel
during the sort. All unflagged records can be marked as deleted.
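
The effect of sort minimum can be imitated sequentially: sort the
pixel records by pixel location and then by range, and flag the
first record of each pixel. The Python sketch below assumes a
record layout of (pixel, range, brightness) and uses the built-in
sort in place of the parallel sort computation; the function
names are hypothetical.

```python
def sort_minimum(pixel_records):
    """Sort records by (pixel, range) and flag the nearest record per pixel.

    pixel_records: list of ((x, y), range, brightness) tuples.
    Returns a list of (record, keep_flag) pairs in sorted order.
    """
    ordered = sorted(pixel_records, key=lambda rec: (rec[0], rec[1]))
    flagged = []
    previous_pixel = None
    for rec in ordered:
        # The first record seen for a pixel has the smallest range.
        flagged.append((rec, rec[0] != previous_pixel))
        previous_pixel = rec[0]
    return flagged


def z_buffer(pixel_records):
    """Keep only the flagged (nearest) record for each pixel."""
    return [rec for rec, keep in sort_minimum(pixel_records) if keep]
```

Records for different pixels never compete with each other, so a
large batch of pixel records can be applied to the Z buffer in a
single sort.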
IMAGE ASSEMBLY

The records in the Z buffer are then used to form a final image.
Techniques for assembling data points into an image were
developed previously in the process of developing algorithms for
point plotting and ray tracing on the MPP [2,3].

Since there may not be a Z buffer record for every pixel in the
image, a template image must be created. This consists of a group
of pixel records that contains a record for every pixel in the
image. Image assembly is a two step operation: pixel value
distribution and image organization. Both of these operations can
be done with sort computation functions. Pixel value distribution
is done with sort distribution. Z buffer records are flagged as
containing valid data and image template records are not. Sort
distribution copies data from Z buffer records to image template
records. This, however, leaves Z buffer records interspersed with
image template records, so the image cannot be displayed in this
form. Since image records are flagged as belonging to the image
template and Z buffer records are not, the records can be sorted
with the image flag as the major key. This will separate the Z
buffer records from the template records. At the same time the
pixel location can be used as the minor key, which will order the
pixels so that they can be displayed as a raster scan image.
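
The two steps of image assembly can be sketched sequentially in
Python. The dictionary record layout, the background value, and
the function name are assumptions made for this example, and the
built-in sort again stands in for the sort computation functions.
The sketch also assumes at most one Z buffer record per pixel.

```python
def assemble_image(z_records, width, height, background=0):
    """z_records: list of ((x, y), value) pairs, at most one per pixel.

    Returns the image as a list of rows.
    """
    template = [{"pixel": (x, y), "value": background, "template": True}
                for y in range(height) for x in range(width)]
    valid = [{"pixel": p, "value": v, "template": False} for p, v in z_records]

    # Pixel value distribution: sort so each Z buffer record sits just before
    # the template record of the same pixel, then copy its value across.
    merged = sorted(template + valid, key=lambda r: (r["pixel"], r["template"]))
    current = None
    for rec in merged:
        if not rec["template"]:
            current = rec
        elif current is not None and current["pixel"] == rec["pixel"]:
            rec["value"] = current["value"]

    # Image organization: template flag is the major key, pixel location
    # (row, then column) is the minor key, giving raster scan order.
    organized = sorted(
        merged,
        key=lambda r: (not r["template"], r["pixel"][1], r["pixel"][0]))
    image_records = organized[:width * height]
    return [[image_records[y * width + x]["value"] for x in range(width)]
            for y in range(height)]
```

Pixels with no Z buffer record keep the background value, which is
the reason the template image is needed.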

CONCLUSION 

This technique is in use on the MPP, which is a 2-D grid of 128
by 128 processors. We are generating 3-D renderings of elevation
data. The data consists of a 512 by 512 grid of points, which is
converted into 524,288 triangles (see Color Plate II, p. 694).
These triangles take from 45 seconds to 75 seconds to render,
which is from 6 to 12 thousand triangles a second. Currently we
are working on more efficient means of data movement and
organization to increase the rendering speed.

REFERENCES 

1 Dorband, John E., Sort Computation, Frontiers 88 Conference
Proceedings, September 1988.

2 Dorband, John E., Sort Computation and Conservative Image
Registration, Ph.D. thesis, Pennsylvania State Univ.,
December 1985.

3 Dorband, John E., 3-D Graphic Generation on the MPP,
Proceedings of the 2nd International Conference on
Supercomputing, Vol. II, pg 305-309, 1987.





